This week's problem set explores the behavior of support vector classifiers and SVMs (following the distinction made in ISLR) on the WiFi localization dataset from the UCI ML archive. We have worked with it on multiple occasions before (most recently two weeks ago, evaluating the performance of logistic regression, discriminant analysis and KNN on it). As two weeks ago, we convert the four-level outcome in the data file to a binary one indicating localization at the third location:
wifiLocDat <- read.table("wifi_localization.txt",sep="\t")
colnames(wifiLocDat) <- c(paste0("WiFi",1:7),"Loc")
ggpairs(wifiLocDat,aes(colour=factor(Loc)))
wifiLocDat[,"Loc3"] <- factor(wifiLocDat[,"Loc"]==3)
wifiLocDat <- wifiLocDat[,colnames(wifiLocDat)!="Loc"]
dim(wifiLocDat)
## [1] 2000 8
summary(wifiLocDat)
## WiFi1 WiFi2 WiFi3 WiFi4
## Min. :-74.00 Min. :-74.00 Min. :-73.00 Min. :-77.00
## 1st Qu.:-61.00 1st Qu.:-58.00 1st Qu.:-58.00 1st Qu.:-63.00
## Median :-55.00 Median :-56.00 Median :-55.00 Median :-56.00
## Mean :-52.33 Mean :-55.62 Mean :-54.96 Mean :-53.57
## 3rd Qu.:-46.00 3rd Qu.:-53.00 3rd Qu.:-51.00 3rd Qu.:-46.00
## Max. :-10.00 Max. :-45.00 Max. :-40.00 Max. :-11.00
## WiFi5 WiFi6 WiFi7 Loc3
## Min. :-89.00 Min. :-97.00 Min. :-98.00 FALSE:1500
## 1st Qu.:-69.00 1st Qu.:-86.00 1st Qu.:-87.00 TRUE : 500
## Median :-64.00 Median :-82.00 Median :-83.00
## Mean :-62.64 Mean :-80.98 Mean :-81.73
## 3rd Qu.:-56.00 3rd Qu.:-77.00 3rd Qu.:-78.00
## Max. :-36.00 Max. :-61.00 Max. :-63.00
head(wifiLocDat)
## WiFi1 WiFi2 WiFi3 WiFi4 WiFi5 WiFi6 WiFi7 Loc3
## 1 -64 -56 -61 -66 -71 -82 -81 FALSE
## 2 -68 -57 -61 -65 -71 -85 -85 FALSE
## 3 -63 -60 -60 -67 -76 -85 -84 FALSE
## 4 -61 -60 -68 -62 -77 -90 -80 FALSE
## 5 -63 -65 -60 -63 -77 -81 -87 FALSE
## 6 -64 -55 -63 -66 -76 -88 -83 FALSE
Here we will use SVM implementation available in library e1071 to fit classifiers with linear and radial (polynomial for extra points) kernels and compare their relative performance as well as to that of random forest and KNN.
Use svm from library e1071 with kernel="linear" to fit a classifier (e.g. ISLR Ch.9.6.1) to the entire WiFi localization dataset, setting parameter cost to 0.001, 1, 1000 and 1 million. Describe how this change in cost affects the model fitting process (hint: the difficulty of the underlying optimization problem increases with cost – can you explain what drives it?) and its outcome (how does the number of support vectors change with cost?), and what the implications of that are. Explain why the change in cost impacts the number of support vectors found. (Hint: there is an answer in ISLR.) Use the tune function from library e1071 (see ISLR Ch.9.6.1 for details and examples of usage) to determine an approximate value of cost (in the range between 0.1 and 100 – the suggested range spanning orders of magnitude should hint that the density of the grid should be approximately logarithmic – e.g. 1, 3, 10, … or 1, 2, 5, 10, … etc.) that yields the lowest error in the cross-validation employed by tune. Set up a resampling procedure repeatedly splitting the entire dataset into training and test sets, using the training data to tune the cost value and the test data to estimate classification error. Report and discuss the distributions of test errors from this procedure and the selected values of cost.
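One way to build the suggested roughly logarithmic grid in base R (a sketch; the exact density of the grid is a matter of choice):

```r
# log-spaced candidate values for cost between 0.1 and 100;
# 7 points gives one value per half decade
costGrid <- 10^seq(log10(0.1), log10(100), length.out = 7)
# a coarser "1, 2, 5, 10, ..." style grid covering similar magnitudes
costGrid2 <- as.vector(outer(c(1, 2, 5), 10^(-1:2)))
```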
#svm
c = 0.001
svmfit = svm(Loc3~.,data=wifiLocDat,kernel="linear",cost=c,scale=FALSE)
summary(svmfit)
##
## Call:
## svm(formula = Loc3 ~ ., data = wifiLocDat, kernel = "linear", cost = c,
## scale = FALSE)
##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: linear
## cost: 0.001
##
## Number of Support Vectors: 1005
##
## ( 505 500 )
##
##
## Number of Classes: 2
##
## Levels:
## FALSE TRUE
P = predict(svmfit)
table(actual=wifiLocDat$Loc3,predicted=P)
## predicted
## actual FALSE TRUE
## FALSE 1500 0
## TRUE 500 0
P_bi = ifelse(P==TRUE,1,0)
Loc3_bi = ifelse(wifiLocDat$Loc3==TRUE,1,0)
assess.prediction(Loc3_bi,P_bi)
## Total cases that are not NA: 2000
## Correct predictions (accuracy): 1500(75%)
## TP, TN, FP, FN, P, N: 0 1500 0 500 500 1500
## TPR (sensitivity)=TP/P: 0%
## TNR (specificity)=TN/N: 100%
## PPV (precision)=TP/(TP+FP): NaN%
## FDR (false discovery)=1-PPV: NaN%
## FPR =FP/N=1-TNR: 0%
#plot(svmfit,wifiLocDat,WiFi3~WiFi5)
With cost of 0.001, the model predicts all points as FALSE, so it has 0% sensitivity; this is not a useful model. The low cost allows a very wide margin, so a large amount of misclassification is tolerated. The wide margin also means that more points lie on or inside the margin, making them support vectors: the count is high at 1005, about half the dataset. The plots show that the red points, which are actually TRUE, are drawn as X symbols (support vectors) and are all misclassified as FALSE.
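The link between cost and the number of support vectors can be illustrated with a toy base-R calculation (an illustration only, not the fitted model): for a fixed boundary, a wider margin contains more points, and every point on or inside the margin becomes a support vector.

```r
set.seed(1)
x <- rnorm(200)                     # 1-D toy feature, boundary at x = 0
# points within distance m of the boundary lie inside a margin of
# half-width m and would be support vectors of a margin that wide
nInMargin <- function(m) sum(abs(x) <= m)
wide   <- nInMargin(1.0)   # low cost -> wide margin tolerated -> many SVs
narrow <- nInMargin(0.1)   # high cost -> narrow margin enforced -> few SVs
```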
op=par(mfrow=c(1,1))
# plot the fitted classifier in every pair of predictor dimensions;
# each panel is a 2-D slice of the 7-D decision boundary
for(m in names(wifiLocDat)[1:7]){
  for(n in names(wifiLocDat)[1:7]){
    if(m != n){
      fm = as.formula(paste0(m, "~", n))  # avoid shadowing base c()
      plot(svmfit,wifiLocDat,fm)
    }
  }
}
par(op)
Plotting some pairs of axes, such as WiFi5 vs. WiFi6, shows an SVM boundary. However, these plots do not seem to match the result that all points are predicted FALSE. The boundary shown is only a slice of the multidimensional hyperplane: each panel fixes the remaining coordinates, so some plots show no boundary at all because the hyperplane does not intersect that slice at those values.
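The slicing can be made concrete with a hypothetical 7-D linear decision function (the weight vector w and intercept b below are placeholders, not the fitted values): holding five coordinates fixed induces a line in the remaining two, and that line moves (or disappears) as the fixed values change.

```r
# hypothetical 7-D linear decision function f(x) = w.x + b
w <- c(0.2, -0.1, 0.3, 0.1, -0.4, 0.25, 0.05)
b <- 1.5
f <- function(x) sum(w * x) + b
# hold WiFi1-WiFi4 and WiFi7 at fixed values; the induced boundary in the
# WiFi5-WiFi6 plane satisfies w5*x5 + w6*x6 + const = 0
fixed <- c(-55, -55, -55, -55, NA, NA, -80)
const <- b + sum(w[c(1:4, 7)] * fixed[c(1:4, 7)])
slope     <- -w[5] / w[6]        # line seen in the 2-D plot
intercept <- -const / w[6]
```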
#svm
c = 1
svmfit = svm(Loc3~.,data=wifiLocDat,kernel="linear",cost=c,scale=FALSE)
summary(svmfit)
##
## Call:
## svm(formula = Loc3 ~ ., data = wifiLocDat, kernel = "linear", cost = c,
## scale = FALSE)
##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: linear
## cost: 1
##
## Number of Support Vectors: 999
##
## ( 500 499 )
##
##
## Number of Classes: 2
##
## Levels:
## FALSE TRUE
P = predict(svmfit)
table(actual=wifiLocDat$Loc3,predicted=P)
## predicted
## actual FALSE TRUE
## FALSE 1435 65
## TRUE 371 129
P_bi = ifelse(P==TRUE,1,0)
Loc3_bi = ifelse(wifiLocDat$Loc3==TRUE,1,0)
assess.prediction(Loc3_bi,P_bi)
## Total cases that are not NA: 2000
## Correct predictions (accuracy): 1564(78.2%)
## TP, TN, FP, FN, P, N: 129 1435 65 371 500 1500
## TPR (sensitivity)=TP/P: 25.8%
## TNR (specificity)=TN/N: 95.7%
## PPV (precision)=TP/(TP+FP): 66.5%
## FDR (false discovery)=1-PPV: 33.5%
## FPR =FP/N=1-TNR: 4.33%
With cost set to 1, the model improves over the 0.001 cost model. The number of support vectors is reduced, but it is still high at roughly half the dataset, and only very slightly lower than in the 0.001 model. The number of false negatives (371) far exceeds the number of true positives (129), so the model is still poor at identifying the positive class.
#svm
c = 1000
svmfit = svm(Loc3~.,data=wifiLocDat,kernel="linear",cost=c,scale=FALSE)
summary(svmfit)
##
## Call:
## svm(formula = Loc3 ~ ., data = wifiLocDat, kernel = "linear", cost = c,
## scale = FALSE)
##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: linear
## cost: 1000
##
## Number of Support Vectors: 295
##
## ( 147 148 )
##
##
## Number of Classes: 2
##
## Levels:
## FALSE TRUE
P = predict(svmfit)
table(actual=wifiLocDat$Loc3,predicted=P)
## predicted
## actual FALSE TRUE
## FALSE 931 569
## TRUE 179 321
P_bi = ifelse(P==TRUE,1,0)
Loc3_bi = ifelse(wifiLocDat$Loc3==TRUE,1,0)
assess.prediction(Loc3_bi,P_bi)
## Total cases that are not NA: 2000
## Correct predictions (accuracy): 1252(62.6%)
## TP, TN, FP, FN, P, N: 321 931 569 179 500 1500
## TPR (sensitivity)=TP/P: 64.2%
## TNR (specificity)=TN/N: 62.1%
## PPV (precision)=TP/(TP+FP): 36.1%
## FDR (false discovery)=1-PPV: 63.9%
## FPR =FP/N=1-TNR: 37.9%
With cost set to 1000, the number of support vectors drops sharply, to about 15% of the data. The runtime is much longer than for the cost 1 and 0.001 models: a higher cost means the optimizer must work harder to find a boundary with a narrow margin that satisfies the smaller budget for margin violations. Unlike the cost 1 model, which produces excessive false negatives, the cost 1000 model produces excessive false positives.
#svm
c = 1000000
svmfit = svm(Loc3~.,data=wifiLocDat,kernel="linear",cost=c,scale=FALSE)
summary(svmfit)
##
## Call:
## svm(formula = Loc3 ~ ., data = wifiLocDat, kernel = "linear", cost = c,
## scale = FALSE)
##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: linear
## cost: 1e+06
##
## Number of Support Vectors: 105
##
## ( 40 65 )
##
##
## Number of Classes: 2
##
## Levels:
## FALSE TRUE
P = predict(svmfit)
table(actual=wifiLocDat$Loc3,predicted=P)
## predicted
## actual FALSE TRUE
## FALSE 387 1113
## TRUE 5 495
P_bi = ifelse(P==TRUE,1,0)
Loc3_bi = ifelse(wifiLocDat$Loc3==TRUE,1,0)
assess.prediction(Loc3_bi,P_bi)
## Total cases that are not NA: 2000
## Correct predictions (accuracy): 882(44.1%)
## TP, TN, FP, FN, P, N: 495 387 1113 5 500 1500
## TPR (sensitivity)=TP/P: 99%
## TNR (specificity)=TN/N: 25.8%
## PPV (precision)=TP/(TP+FP): 30.8%
## FDR (false discovery)=1-PPV: 69.2%
## FPR =FP/N=1-TNR: 74.2%
With a cost of one million, the runtime is the longest, and the model has the fewest support vectors of all four fits. It reduces the number of false negatives, but at the expense of many more false positives. In a medical analogy, such a model would ensure that nearly everyone who needs a treatment gets it, while also treating many who do not – acceptable only if unneeded treatment causes no harm. It would only skip someone when very confident that the person does not need it.
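As a sanity check, the sensitivity and specificity reported above for the cost = 1e+06 fit can be recomputed directly from its confusion matrix in base R:

```r
# confusion matrix from the cost = 1e6 fit (rows: actual, cols: predicted)
cm <- matrix(c(387, 5, 1113, 495), 2, 2,
             dimnames = list(actual = c("FALSE", "TRUE"),
                             predicted = c("FALSE", "TRUE")))
TP <- cm["TRUE", "TRUE"];  FN <- cm["TRUE", "FALSE"]
TN <- cm["FALSE", "FALSE"]; FP <- cm["FALSE", "TRUE"]
sens <- TP / (TP + FN)   # nearly all positives are caught
spec <- TN / (TN + FP)   # at the price of many false alarms
```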
# tune cost by cross-validation:
set.seed(1)
tune.out = tune(svm, Loc3~., data=wifiLocDat, kernel="linear", ranges=list(cost=c(0.001, 0.01, 0.1, 1, 5, 10, 100, 1000)))
summary(tune.out)
##
## Parameter tuning of 'svm':
##
## - sampling method: 10-fold cross validation
##
## - best parameters:
## cost
## 100
##
## - best performance: 0.232
##
## - Detailed performance results:
## cost error dispersion
## 1 1e-03 0.2500 0.02818589
## 2 1e-02 0.2500 0.02818589
## 3 1e-01 0.2505 0.02681728
## 4 1e+00 0.2345 0.04412419
## 5 5e+00 0.2325 0.04250817
## 6 1e+01 0.2325 0.04250817
## 7 1e+02 0.2320 0.04353798
## 8 1e+03 0.2525 0.05608773
The tune function indicates that a cost of 100 is best.
# denser grid around minimum:
set.seed(1)
tune.out.1 = tune(svm, Loc3~., data=wifiLocDat, kernel="linear", ranges=list(cost=seq(14,15,0.1)))
summary(tune.out.1)
##
## Parameter tuning of 'svm':
##
## - sampling method: 10-fold cross validation
##
## - best parameters:
## cost
## 14.3
##
## - best performance: 0.232
##
## - Detailed performance results:
## cost error dispersion
## 1 14.0 0.2325 0.04250817
## 2 14.1 0.2325 0.04250817
## 3 14.2 0.2325 0.04250817
## 4 14.3 0.2320 0.04353798
## 5 14.4 0.2320 0.04353798
## 6 14.5 0.2320 0.04353798
## 7 14.6 0.2320 0.04353798
## 8 14.7 0.2320 0.04353798
## 9 14.8 0.2320 0.04353798
## 10 14.9 0.2320 0.04353798
## 11 15.0 0.2320 0.04353798
After several tries, costs between roughly 14.3 and 100 all yield the same best cross-validation error of 0.232.
# best model:
bestmod = tune.out.1$best.model
summary(bestmod)
##
## Call:
## best.tune(method = svm, train.x = Loc3 ~ ., data = wifiLocDat, ranges = list(cost = seq(14,
## 15, 0.1)), kernel = "linear")
##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: linear
## cost: 14.3
##
## Number of Support Vectors: 999
##
## ( 500 499 )
##
##
## Number of Classes: 2
##
## Levels:
## FALSE TRUE
P = predict(bestmod)
tab = table(actual=wifiLocDat$Loc3,predicted=P)
tab
## predicted
## actual FALSE TRUE
## FALSE 1435 65
## TRUE 371 129
P_bi = ifelse(P==TRUE,1,0)
Loc3_bi = ifelse(wifiLocDat$Loc3==TRUE,1,0)
assess.prediction(Loc3_bi,P_bi)
## Total cases that are not NA: 2000
## Correct predictions (accuracy): 1564(78.2%)
## TP, TN, FP, FN, P, N: 129 1435 65 371 500 1500
## TPR (sensitivity)=TP/P: 25.8%
## TNR (specificity)=TN/N: 95.7%
## PPV (precision)=TP/(TP+FP): 66.5%
## FDR (false discovery)=1-PPV: 33.5%
## FPR =FP/N=1-TNR: 4.33%
#repeat with train/test sets
cost = NULL
accu = NULL
for( i in 1:10 ){
train=sort(sample(nrow(wifiLocDat),floor(0.8*nrow(wifiLocDat))))
#split data into training and test sets
dtrain = subset(wifiLocDat[train,])
dtest = subset(wifiLocDat[-train,])
dtune = tune(svm, Loc3~., data=dtrain, kernel="linear", ranges=list(cost=c(0.001,0.01,0.1,1,2,5,10,20,30)))
bestmod = dtune$best.model
P = predict(bestmod,dtest)
tab = table(actual=dtest$Loc3,predicted=P)
cost = c(cost,bestmod$cost)
accu = c(accu,sum(diag(tab))/sum(tab))
}
sort(unique(cost))
## [1] 1e-03 1e+00 2e+00 5e+00 1e+01 2e+01
mean(accu)
## [1] 0.76125
plot(cost,accu)
The selected cost varies from 0.001 to 20 across resampling iterations, while the test accuracy varies from about 0.72 to 0.78 – a fairly narrow range.
Fit random forest classifier on the entire WiFi localization dataset with default parameters. Calculate resulting misclassification error as reported by the confusion matrix in random forest output. Explain why error reported in random forest confusion matrix represents estimated test (as opposed to train) error of the procedure. Compare resulting test error to that for support vector classifier obtained above and discuss results of such comparison.
The error based on the random forest $predicted values is a test error, because each observation's prediction is made only by trees that did not use that observation in training. In other words, these are out-of-bag (OOB) predictions: each tree is fit on a bootstrap sample, and the observations left out of that sample serve as an internal test set for that tree.
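The OOB reasoning above can be checked with a one-line base-R calculation: a bootstrap sample draws n observations with replacement, so a given observation is left out of any one tree's sample with probability (1 - 1/n)^n, which approaches e^(-1) ≈ 0.368.

```r
n <- 2000                 # size of the WiFi dataset
pOOB <- (1 - 1/n)^n       # chance a given point is out-of-bag for one tree
# about 37% of the trees never saw any given point; only those trees vote
# to form $predicted, making the confusion matrix an out-of-sample estimate
```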
features = wifiLocDat[,-which(names(wifiLocDat) %in% c("Loc3"))]
class = wifiLocDat$Loc3
# Fit random forest to train data, obtain test error:
rfRes = randomForest(features,class)
tab = table(actual=class,predicted=rfRes$predicted)
accu = sum(diag(tab))/sum(tab)
tab
## predicted
## actual FALSE TRUE
## FALSE 1486 14
## TRUE 13 487
accu
## [1] 0.9865
P = rfRes$predicted
P_bi = ifelse(P==TRUE,1,0)
Loc3_bi = ifelse(wifiLocDat$Loc3==TRUE,1,0)
assess.prediction(Loc3_bi,P_bi)
## Total cases that are not NA: 2000
## Correct predictions (accuracy): 1973(98.6%)
## TP, TN, FP, FN, P, N: 487 1486 14 13 500 1500
## TPR (sensitivity)=TP/P: 97.4%
## TNR (specificity)=TN/N: 99.1%
## PPV (precision)=TP/(TP+FP): 97.2%
## FDR (false discovery)=1-PPV: 2.79%
## FPR =FP/N=1-TNR: 0.933%
Compared to the support vector classifier, random forest has much better accuracy, at about 98.6%, with very high TPR and TNR. The difference suggests the classes are not linearly separable, which is why the linear SVM performs poorly.
Use convenience wrapper tune.knn provided by the library e1071 on the entire dataset to determine optimal value for the number of the nearest neighbors ‘k’ to be used in KNN classifier. Consider our observations from week 9 problem set when choosing range of values of k to be evaluated by tune.knn. Setup resampling procedure similar to that used above for support vector classifier that will repeatedly: a) split WiFi localization dataset into training and test, b) use tune.knn on training data to determine optimal k, and c) use k estimated by tune.knn to make KNN classifications on test data. Report and discuss distributions of test errors from this procedure and selected values of k, compare them to those obtained for random forest and support vector classifier above.
#knn on entire dataset
features = wifiLocDat[,-which(names(wifiLocDat) %in% c("Loc3"))]
class = wifiLocDat$Loc3
set.seed(1)
ktune = tune.knn(y=class, x=features, k = 1:20)
summary(ktune)
##
## Parameter tuning of 'knn.wrapper':
##
## - sampling method: 10-fold cross validation
##
## - best parameters:
## k
## 3
##
## - best performance: 0.013
##
## - Detailed performance results:
## k error dispersion
## 1 1 0.0135 0.008514693
## 2 2 0.0160 0.006992059
## 3 3 0.0130 0.006749486
## 4 4 0.0180 0.005374838
## 5 5 0.0145 0.007245688
## 6 6 0.0155 0.006851602
## 7 7 0.0165 0.005797509
## 8 8 0.0180 0.005868939
## 9 9 0.0185 0.005797509
## 10 10 0.0185 0.007090682
## 11 11 0.0180 0.006749486
## 12 12 0.0190 0.006992059
## 13 13 0.0170 0.006749486
## 14 14 0.0190 0.004594683
## 15 15 0.0180 0.005374838
## 16 16 0.0170 0.005868939
## 17 17 0.0185 0.005296750
## 18 18 0.0180 0.006324555
## 19 19 0.0185 0.006258328
## 20 20 0.0175 0.005400617
Using the entire dataset, k=3 gives the best error.
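What tune.knn is optimizing over can be sketched with a minimal k-NN classifier in base R (an illustration only; the actual runs use the knn wrapper provided through e1071):

```r
# minimal k-nearest-neighbours majority vote using Euclidean distance
knnPredict <- function(trainX, trainY, testX, k) {
  apply(testX, 1, function(q) {
    d  <- sqrt(colSums((t(trainX) - q)^2))  # distances to all training points
    nb <- trainY[order(d)[1:k]]             # labels of the k closest
    names(which.max(table(nb)))             # majority vote
  })
}
# toy check on two well-separated clusters
trainX <- rbind(matrix(0, 5, 2), matrix(10, 5, 2))
trainY <- rep(c("FALSE", "TRUE"), each = 5)
testX  <- rbind(c(1, 1), c(9, 9))
pred <- knnPredict(trainX, trainY, testX, k = 3)
```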
#knn on repeated train/test sets
bestk = NULL
accu = NULL
for( i in 1:20 ){
train=sort(sample(nrow(wifiLocDat),floor(0.8*nrow(wifiLocDat))))
#split data into training and test sets
dtrain = subset(wifiLocDat[train,])
classTrain = dtrain[,which(names(wifiLocDat) %in% c("Loc3"))]
featTrain = dtrain[,-which(names(wifiLocDat) %in% c("Loc3"))]
dtest = subset(wifiLocDat[-train,])
classTest = dtest[,which(names(wifiLocDat) %in% c("Loc3"))]
featTest = dtest[,-which(names(wifiLocDat) %in% c("Loc3"))]
ktune = tune.knn(y=classTrain, x=featTrain, k = 1:20)
knnRes = knn(featTrain,featTest,classTrain,k=ktune$best.parameters$k)
tab = table(classTest,knnRes)
bestk = c(bestk,ktune$best.parameters$k)
accu = c(accu,sum(diag(tab))/sum(tab))
}
#check last run
P_bi = ifelse(knnRes==TRUE,1,0)
Loc3_bi = ifelse(classTest==TRUE,1,0)
assess.prediction(Loc3_bi,P_bi)
## Total cases that are not NA: 400
## Correct predictions (accuracy): 395(98.8%)
## TP, TN, FP, FN, P, N: 93 302 4 1 94 306
## TPR (sensitivity)=TP/P: 98.9%
## TNR (specificity)=TN/N: 98.7%
## PPV (precision)=TP/(TP+FP): 95.9%
## FDR (false discovery)=1-PPV: 4.12%
## FPR =FP/N=1-TNR: 1.31%
sort(unique(bestk))
## [1] 1 3 4 5 6 12
mean(accu)
## [1] 0.985375
plot(bestk,accu)
The selected k varies from 1 to 12, with 1 occurring most often. The accuracies fall in a narrow range of about 0.975 to 0.993. These values are good and comparable to random forest.
Plot SVM model fit to the WiFi localization dataset using (for the ease of plotting) only the first and the second attributes as predictor variables, kernel="radial", cost=10 and gamma=5 (see ISLR Ch.9.6.2 for an example of that done with a simulated dataset). You should be able to see in the resulting plot the magenta-cyan (or, in more recent versions of e1071 – yellow-brown) classification boundary as computed by this model. Produce the same kinds of plots using 0.5 and 50 as values of gamma also. Compare classification boundaries between these three plots and describe how they are impacted by the change in the value of gamma. Can you trace it back to the role of gamma in the equation introducing it with the radial kernel in ISLR?
#svm
c = 10
g = 0.5
svmfit = svm(Loc3~WiFi1+WiFi2,data=wifiLocDat,kernel="radial",cost=c,scale=FALSE,gamma=g)
summary(svmfit)
##
## Call:
## svm(formula = Loc3 ~ WiFi1 + WiFi2, data = wifiLocDat, kernel = "radial",
## cost = c, gamma = g, scale = FALSE)
##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: radial
## cost: 10
##
## Number of Support Vectors: 538
##
## ( 340 198 )
##
##
## Number of Classes: 2
##
## Levels:
## FALSE TRUE
plot(svmfit,wifiLocDat,WiFi1~WiFi2)
P = predict(svmfit)
table(actual=wifiLocDat$Loc3,predicted=P)
## predicted
## actual FALSE TRUE
## FALSE 1453 47
## TRUE 13 487
P_bi = ifelse(P==TRUE,1,0)
Loc3_bi = ifelse(wifiLocDat$Loc3==TRUE,1,0)
assess.prediction(Loc3_bi,P_bi)
## Total cases that are not NA: 2000
## Correct predictions (accuracy): 1940(97%)
## TP, TN, FP, FN, P, N: 487 1453 47 13 500 1500
## TPR (sensitivity)=TP/P: 97.4%
## TNR (specificity)=TN/N: 96.9%
## PPV (precision)=TP/(TP+FP): 91.2%
## FDR (false discovery)=1-PPV: 8.8%
## FPR =FP/N=1-TNR: 3.13%
The radial kernel produces much better results than the linear kernel from the previous problem: accuracy, TPR and TNR are all very high. Visually, we can also see the predicted TRUE region, which contains most of the red (actually TRUE) points.
#svm
c = 10
g = 5
svmfit = svm(Loc3~WiFi1+WiFi2,data=wifiLocDat,kernel="radial",cost=c,scale=FALSE,gamma=g)
summary(svmfit)
##
## Call:
## svm(formula = Loc3 ~ WiFi1 + WiFi2, data = wifiLocDat, kernel = "radial",
## cost = c, gamma = g, scale = FALSE)
##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: radial
## cost: 10
##
## Number of Support Vectors: 632
##
## ( 429 203 )
##
##
## Number of Classes: 2
##
## Levels:
## FALSE TRUE
plot(svmfit,wifiLocDat,WiFi1~WiFi2)
P = predict(svmfit)
table(actual=wifiLocDat$Loc3,predicted=P)
## predicted
## actual FALSE TRUE
## FALSE 1461 39
## TRUE 20 480
P_bi = ifelse(P==TRUE,1,0)
Loc3_bi = ifelse(wifiLocDat$Loc3==TRUE,1,0)
assess.prediction(Loc3_bi,P_bi)
## Total cases that are not NA: 2000
## Correct predictions (accuracy): 1941(97%)
## TP, TN, FP, FN, P, N: 480 1461 39 20 500 1500
## TPR (sensitivity)=TP/P: 96%
## TNR (specificity)=TN/N: 97.4%
## PPV (precision)=TP/(TP+FP): 92.5%
## FDR (false discovery)=1-PPV: 7.51%
## FPR =FP/N=1-TNR: 2.6%
With a gamma of 5, the accuracy, TPR and TNR are comparable to the gamma 0.5 model, yet visually the predicted TRUE region appears smaller than in the gamma 0.5 model.
#svm
c = 10
g = 50
svmfit = svm(Loc3~WiFi1+WiFi2,data=wifiLocDat,kernel="radial",cost=c,scale=FALSE,gamma=g)
summary(svmfit)
##
## Call:
## svm(formula = Loc3 ~ WiFi1 + WiFi2, data = wifiLocDat, kernel = "radial",
## cost = c, gamma = g, scale = FALSE)
##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: radial
## cost: 10
##
## Number of Support Vectors: 629
##
## ( 429 200 )
##
##
## Number of Classes: 2
##
## Levels:
## FALSE TRUE
plot(svmfit,wifiLocDat,WiFi1~WiFi2)
P = predict(svmfit)
table(actual=wifiLocDat$Loc3,predicted=P)
## predicted
## actual FALSE TRUE
## FALSE 1461 39
## TRUE 20 480
P_bi = ifelse(P==TRUE,1,0)
Loc3_bi = ifelse(wifiLocDat$Loc3==TRUE,1,0)
assess.prediction(Loc3_bi,P_bi)
## Total cases that are not NA: 2000
## Correct predictions (accuracy): 1941(97%)
## TP, TN, FP, FN, P, N: 480 1461 39 20 500 1500
## TPR (sensitivity)=TP/P: 96%
## TNR (specificity)=TN/N: 97.4%
## PPV (precision)=TP/(TP+FP): 92.5%
## FDR (false discovery)=1-PPV: 7.51%
## FPR =FP/N=1-TNR: 2.6%
plot(svmfit,wifiLocDat,WiFi1~WiFi2,xlim=c(-58.1,-57.9),ylim=c(-50.1,-49.9))
Results from the gamma=50 model look nearly identical to those from the gamma=5 model. However, the predicted TRUE regions are not visible in the plot, even when zoomed in on a single point. It appears that a high gamma shrinks the TRUE regions essentially to the data points themselves. The shrinking is already evident in the gamma=5 model and most pronounced at gamma=50.
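The shrinking regions trace directly back to the radial kernel K(x, x') = exp(-gamma * ||x - x'||^2) from ISLR: larger gamma makes the similarity decay faster with distance, so each support vector influences only a tiny neighbourhood. A quick base-R check:

```r
# radial (RBF) kernel value between two points
rbf <- function(x1, x2, gamma) exp(-gamma * sum((x1 - x2)^2))
x  <- c(-58, -50)   # a point on the WiFi1/WiFi2 scale
xq <- c(-57, -50)   # a query point 1 dBm away
kLow  <- rbf(x, xq, gamma = 0.5)   # appreciable influence at this distance
kMid  <- rbf(x, xq, gamma = 5)     # influence nearly gone
kHigh <- rbf(x, xq, gamma = 50)    # influence essentially zero
```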
Similar to how it was done above for support vector classifier (and KNN), set up a resampling process that will repeatedly: a) split the entire dataset (using all attributes as predictors) into training and test datasets, b) use tune function to determine optimal values of cost and gamma and c) calculate test error using these values of cost and gamma. Consider what you have learned above about the effects of the parameters cost and gamma to decide on the starting ranges of their values to be evaluated by tune. Additionally, experiment with different sets of their values and discuss in your solution the results of it and how you would go about selecting those ranges starting from scratch. Present resulting test error graphically, compare it to that of support vector classifier (with linear kernel), random forest and KNN classifiers obtained above and discuss results of these comparisons.
# tune cost by cross-validation:
set.seed(1)
tune.out = tune(svm, Loc3~WiFi1+WiFi2, data=wifiLocDat, kernel="radial", ranges=list(cost=c(0.001, 0.01, 0.1, 1, 5, 10, 100, 1000),gamma=c(0.5,5,50)))
summary(tune.out)
##
## Parameter tuning of 'svm':
##
## - sampling method: 10-fold cross validation
##
## - best parameters:
## cost gamma
## 10 0.5
##
## - best performance: 0.0375
##
## - Detailed performance results:
## cost gamma error dispersion
## 1 1e-03 0.5 0.2500 0.028185891
## 2 1e-02 0.5 0.2500 0.028185891
## 3 1e-01 0.5 0.0560 0.012649111
## 4 1e+00 0.5 0.0425 0.008897565
## 5 5e+00 0.5 0.0390 0.006992059
## 6 1e+01 0.5 0.0375 0.005892557
## 7 1e+02 0.5 0.0410 0.006582806
## 8 1e+03 0.5 0.0425 0.007905694
## 9 1e-03 5.0 0.2500 0.028185891
## 10 1e-02 5.0 0.2370 0.028303906
## 11 1e-01 5.0 0.0455 0.006433420
## 12 1e+00 5.0 0.0410 0.005676462
## 13 5e+00 5.0 0.0420 0.006324555
## 14 1e+01 5.0 0.0420 0.006749486
## 15 1e+02 5.0 0.0390 0.006146363
## 16 1e+03 5.0 0.0435 0.008181958
## 17 1e-03 50.0 0.2500 0.028185891
## 18 1e-02 50.0 0.2500 0.028185891
## 19 1e-01 50.0 0.1240 0.014681810
## 20 1e+00 50.0 0.0435 0.009143911
## 21 5e+00 50.0 0.0485 0.010013879
## 22 1e+01 50.0 0.0505 0.010124228
## 23 1e+02 50.0 0.0510 0.010219806
## 24 1e+03 50.0 0.0510 0.010219806
On the full dataset without splits, the best cost is 10 and the best gamma is 0.5.
# denser grid around minimum:
set.seed(1)
tune.out.1 = tune(svm, Loc3~WiFi1+WiFi2, data=wifiLocDat, kernel="radial", ranges=list(cost=c(9.6,9.7,9.8,9.9,10,10.1),gamma=c(0.3,0.4,0.5,0.6,0.7)))
summary(tune.out.1)
##
## Parameter tuning of 'svm':
##
## - sampling method: 10-fold cross validation
##
## - best parameters:
## cost gamma
## 9.8 0.5
##
## - best performance: 0.0375
##
## - Detailed performance results:
## cost gamma error dispersion
## 1 9.6 0.3 0.0430 0.009189366
## 2 9.7 0.3 0.0435 0.008834906
## 3 9.8 0.3 0.0435 0.008834906
## 4 9.9 0.3 0.0435 0.008834906
## 5 10.0 0.3 0.0430 0.009189366
## 6 10.1 0.3 0.0415 0.008181958
## 7 9.6 0.4 0.0395 0.008644202
## 8 9.7 0.4 0.0395 0.008644202
## 9 9.8 0.4 0.0395 0.008644202
## 10 9.9 0.4 0.0395 0.008644202
## 11 10.0 0.4 0.0395 0.008644202
## 12 10.1 0.4 0.0395 0.008644202
## 13 9.6 0.5 0.0380 0.006324555
## 14 9.7 0.5 0.0380 0.006324555
## 15 9.8 0.5 0.0375 0.005892557
## 16 9.9 0.5 0.0375 0.005892557
## 17 10.0 0.5 0.0375 0.005892557
## 18 10.1 0.5 0.0375 0.005892557
## 19 9.6 0.6 0.0385 0.004116363
## 20 9.7 0.6 0.0385 0.004116363
## 21 9.8 0.6 0.0385 0.004116363
## 22 9.9 0.6 0.0385 0.004116363
## 23 10.0 0.6 0.0385 0.004116363
## 24 10.1 0.6 0.0385 0.004116363
## 25 9.6 0.7 0.0380 0.004830459
## 26 9.7 0.7 0.0380 0.004830459
## 27 9.8 0.7 0.0380 0.004830459
## 28 9.9 0.7 0.0380 0.004830459
## 29 10.0 0.7 0.0380 0.004830459
## 30 10.1 0.7 0.0380 0.004830459
With a denser grid, the best cost is 9.8 and best gamma is 0.5. We can try to set the ranges near these values.
# best model:
bestmod = tune.out.1$best.model
summary(bestmod)
##
## Call:
## best.tune(method = svm, train.x = Loc3 ~ WiFi1 + WiFi2, data = wifiLocDat,
## ranges = list(cost = c(9.6, 9.7, 9.8, 9.9, 10, 10.1), gamma = c(0.3,
## 0.4, 0.5, 0.6, 0.7)), kernel = "radial")
##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: radial
## cost: 9.8
##
## Number of Support Vectors: 264
##
## ( 134 130 )
##
##
## Number of Classes: 2
##
## Levels:
## FALSE TRUE
P = predict(bestmod)
tab = table(actual=wifiLocDat$Loc3,predicted=P)
tab
## predicted
## actual FALSE TRUE
## FALSE 1450 50
## TRUE 23 477
P_bi = ifelse(P==TRUE,1,0)
Loc3_bi = ifelse(wifiLocDat$Loc3==TRUE,1,0)
assess.prediction(Loc3_bi,P_bi)
## Total cases that are not NA: 2000
## Correct predictions (accuracy): 1927(96.4%)
## TP, TN, FP, FN, P, N: 477 1450 50 23 500 1500
## TPR (sensitivity)=TP/P: 95.4%
## TNR (specificity)=TN/N: 96.7%
## PPV (precision)=TP/(TP+FP): 90.5%
## FDR (false discovery)=1-PPV: 9.49%
## FPR =FP/N=1-TNR: 3.33%
# repeat with train/test sets
cost = NULL
gamma = NULL
accu = NULL
for( i in 1:10 ){
train=sort(sample(nrow(wifiLocDat),floor(0.8*nrow(wifiLocDat))))
#split data into training and test sets
dtrain = subset(wifiLocDat[train,])
dtest = subset(wifiLocDat[-train,])
dtune = tune(svm, Loc3~WiFi1+WiFi2, data=dtrain, kernel="radial", ranges=list(cost=c(0.3,1,3,9.8,27),gamma=c(0.04,0.5,1,5,25)))
bestmod = dtune$best.model
P = predict(bestmod,dtest)
tab = table(actual=dtest$Loc3,predicted=P)
cost = c(cost,bestmod$cost)
gamma = c(gamma,bestmod$gamma)
accu = c(accu,sum(diag(tab))/sum(tab))
}
#check results from last run
summary(dtune)
summary(bestmod)
P_bi = ifelse(P==TRUE,1,0)
Loc3_bi = ifelse(dtest$Loc3==TRUE,1,0)
assess.prediction(Loc3_bi,P_bi)
tab
plot(bestmod,wifiLocDat,WiFi1~WiFi2)
sort(unique(cost))
## [1] 1.0 3.0 9.8 27.0
sort(unique(gamma))
## [1] 0.5 1.0
mean(accu)
## [1] 0.96075
plot(cost,accu)
plot(gamma,accu)
Running 10 iterations of tune resulted in the optimal parameter values above. The accuracies vary from about 0.95 to 0.97. Using just two predictors, the tuned radial model has an accuracy on par with the random forest and KNN models. For contrast, a fit with a deliberately low cost (0.2) is shown below.
#svm
c = 0.2
g = 1
svmfit = svm(Loc3~WiFi1+WiFi2,data=wifiLocDat,kernel="radial",cost=c,scale=FALSE,gamma=g)
summary(svmfit)
##
## Call:
## svm(formula = Loc3 ~ WiFi1 + WiFi2, data = wifiLocDat, kernel = "radial",
## cost = c, gamma = g, scale = FALSE)
##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: radial
## cost: 0.2
##
## Number of Support Vectors: 849
##
## ( 465 384 )
##
##
## Number of Classes: 2
##
## Levels:
## FALSE TRUE
plot(svmfit,wifiLocDat,WiFi1~WiFi2)
P = predict(svmfit)
table(actual=wifiLocDat$Loc3,predicted=P)
## predicted
## actual FALSE TRUE
## FALSE 1464 36
## TRUE 72 428
P_bi = ifelse(P==TRUE,1,0)
Loc3_bi = ifelse(wifiLocDat$Loc3==TRUE,1,0)
assess.prediction(Loc3_bi,P_bi)
## Total cases that are not NA: 2000
## Correct predictions (accuracy): 1892(94.6%)
## TP, TN, FP, FN, P, N: 428 1464 36 72 500 1500
## TPR (sensitivity)=TP/P: 85.6%
## TNR (specificity)=TN/N: 97.6%
## PPV (precision)=TP/(TP+FP): 92.2%
## FDR (false discovery)=1-PPV: 7.76%
## FPR =FP/N=1-TNR: 2.4%
Repeat what was done above (plots of decision boundaries for various interesting values of tuning parameters and test error for their best values estimated from training data) using kernel="polynomial". Determine ranges of coef0, degree, cost and gamma to be evaluated by tune. Present and discuss resulting test error and how it compares to linear and radial kernels and those of random forest and KNN.
#svm on entire data set
c = 1
g = 1
d = 2
co = 1
svmfit = svm(Loc3~WiFi1+WiFi2,data=wifiLocDat,kernel="polynomial",cost=c,scale=FALSE,gamma=g,degree=d,coef0=co)
summary(svmfit)
plot(svmfit,wifiLocDat,WiFi1~WiFi2)
P = predict(svmfit)
table(actual=wifiLocDat$Loc3,predicted=P)
P_bi = ifelse(P==TRUE,1,0)
Loc3_bi = ifelse(wifiLocDat$Loc3==TRUE,1,0)
assess.prediction(Loc3_bi,P_bi)
set.seed(1)
train=sort(sample(nrow(wifiLocDat),floor(0.8*nrow(wifiLocDat))))
#split data into training and test sets
dtrain = subset(wifiLocDat[train,])
dtest = subset(wifiLocDat[-train,])
dtune = tune(svm, Loc3~WiFi1+WiFi2, data=dtrain, kernel="polynomial", ranges=list(gamma=c(5,50),cost=c(1,10),degree=c(2,3),coef0=c(0,1)))
bestmod = dtune$best.model
P = predict(bestmod,dtest)
tab = table(actual=dtest$Loc3,predicted=P)
summary(dtune)
##
## Parameter tuning of 'svm':
##
## - sampling method: 10-fold cross validation
##
## - best parameters:
## gamma cost degree coef0
## 50 10 2 1
##
## - best performance: 0.04125
##
## - Detailed performance results:
## gamma cost degree coef0 error dispersion
## 1 5 1 2 0 0.118125 0.02154010
## 2 50 1 2 0 0.118750 0.02165064
## 3 5 10 2 0 0.118125 0.02154010
## 4 50 10 2 0 0.113125 0.01625801
## 5 5 1 3 0 0.247500 0.03463600
## 6 50 1 3 0 0.377500 0.13027748
## 7 5 10 3 0 0.247500 0.03463600
## 8 50 10 3 0 0.350000 0.11380417
## 9 5 1 2 1 0.043750 0.01214782
## 10 50 1 2 1 0.042500 0.01343710
## 11 5 10 2 1 0.042500 0.01343710
## 12 50 10 2 1 0.041250 0.01148671
## 13 5 1 3 1 0.042500 0.01012080
## 14 50 1 3 1 0.202500 0.11972190
## 15 5 10 3 1 0.042500 0.01343710
## 16 50 10 3 1 0.441875 0.07247186
summary(bestmod)
##
## Call:
## best.tune(method = svm, train.x = Loc3 ~ WiFi1 + WiFi2, data = dtrain,
## ranges = list(gamma = c(5, 50), cost = c(1, 10), degree = c(2,
## 3), coef0 = c(0, 1)), kernel = "polynomial")
##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: polynomial
## cost: 10
## degree: 2
## coef.0: 1
##
## Number of Support Vectors: 197
##
## ( 98 99 )
##
##
## Number of Classes: 2
##
## Levels:
## FALSE TRUE
P_bi = ifelse(P==TRUE,1,0)
Loc3_bi = ifelse(dtest$Loc3==TRUE,1,0)
assess.prediction(Loc3_bi,P_bi)
## Total cases that are not NA: 400
## Correct predictions (accuracy): 380(95%)
## TP, TN, FP, FN, P, N: 92 288 8 12 104 296
## TPR (sensitivity)=TP/P: 88.5%
## TNR (specificity)=TN/N: 97.3%
## PPV (precision)=TP/(TP+FP): 92%
## FDR (false discovery)=1-PPV: 8%
## FPR =FP/N=1-TNR: 2.7%
tab
## predicted
## actual FALSE TRUE
## FALSE 288 8
## TRUE 12 92
plot(bestmod,wifiLocDat,WiFi1~WiFi2)
The optimal parameters are listed above. The accuracy is about 95%, on par with the random forest, KNN, and radial kernel SVM models.
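The poor results for coef0 = 0 in the grid above are consistent with the polynomial kernel formula K(x, x') = (gamma * <x, x'> + coef0)^degree: with coef0 = 0 the implicit feature expansion contains only the degree-d monomials, while coef0 = 1 mixes in all lower-order terms as well. A small base-R check on orthogonal toy inputs:

```r
# polynomial kernel as parameterized in e1071 (gamma, degree, coef0)
polyKernel <- function(x1, x2, gamma, degree, coef0)
  (gamma * sum(x1 * x2) + coef0)^degree
x1 <- c(1, 0); x2 <- c(0, 1)   # orthogonal toy inputs
# with coef0 = 0, orthogonal points get zero similarity at any degree
k0 <- polyKernel(x1, x2, gamma = 1, degree = 2, coef0 = 0)
# coef0 = 1 restores the constant and lower-order terms
k1 <- polyKernel(x1, x2, gamma = 1, degree = 2, coef0 = 1)
```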
## This takes too long to run
set.seed(1)
train=sort(sample(nrow(wifiLocDat),floor(0.8*nrow(wifiLocDat))))
#split data into training and test sets
dtrain = subset(wifiLocDat[train,])
dtest = subset(wifiLocDat[-train,])
dtune = tune(svm, Loc3~WiFi1+WiFi2, data=dtrain, kernel="polynomial", ranges=list(gamma=c(50,500,5000),cost=c(10,100),degree=c(1,2,3),coef0=c(0.001,0.01,0.1,1)))
bestmod = dtune$best.model
P = predict(bestmod,dtest)
tab = table(actual=dtest$Loc3,predicted=P)
summary(dtune)
summary(bestmod)
P_bi = ifelse(P==TRUE,1,0)
Loc3_bi = ifelse(dtest$Loc3==TRUE,1,0)
assess.prediction(Loc3_bi,P_bi)
tab
plot(bestmod,wifiLocDat,WiFi1~WiFi2)